-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Squashing port-to-postgresql further #1199
Conversation
4f5d7cf
to
8126a74
Compare
310fb24
to
38683f0
Compare
1f8ea36
to
5932dfc
Compare
This removes the mongo configuration and migrates us to PostgreSQL.
This removes the MongoDB configuration and adds PostgreSQL.
These won't be needed after migration to PostgreSQL.
09c79c1
to
18e2bd2
Compare
This adds migrations which construct the basics of a database in PostgreSQL and dumps the schema. Remove mongo_instrumentation and setup in tests.
Added bonus - we can now use PostgreSQL's JSONB search abilities to simplify the previously very-verbose `content_items(term)` query.
These can be simplified after migration to PostgreSQL.
Resolve issue with before/after comparison failing in should_register_routes?
ContentItem.create_and_replace had some required behaviour that didn't seem to be tested for - such as exactly what to do with an existing item on the given base_path. Made sure that this behaviour is maintented in this branch, and added explicit specs to test this.
This will allow us to cross-reference PostgreSQL records to MongoDB records post-migration if needed
Support import of doubly-nested mongo date fields Add field mappings for ScheduledPublishingLogEntry and PublishIntent Add mongo_id field to user & scheduled_publishing_log_entry Add `rails_timestamp` method to remove conflicts with ActiveRecord behaviour when doing .insert with some-but-not-all values given Add support for batch_size in JsonImporter
This will allow us to perform side-by-side performance comparisons of the Mongo and Postgres content-stores on the same hardware (e.g. local dev laptop) and prove that the PostgreSQL content-store is at least as performant as the Mongo version.
This improves response times to around 30% of previous values
These will no longer be needed after migration to PostgreSQL.
Some records in the MongoDB have nil values in `created_at` or `updated_at`. ActiveRecord's `timestamps` migration method by default creates these fields without allowing nil values, so we must explicitly add support for this after the fact.
Some records in the old MongoDb have `description` as a simple value, some have it as a Hash. We need to support both, and make sure that we only wrap the given value in a Hash if it's not already like that.
This fixes a bug where unpublished redirects in short URL manager aren't removed from the content store (so continue to work on the website). Users of the content-store API (i.e. publishing-api) might make API calls with values they want to reset provided as `nil`. For example, if you wanted to clear some redirects on a content item, you might do something like: ``` PUT /content/some-made-up-url { ... "redirects": nil ... } ``` The intent of the user of the API is clear here - they want no redirects. However, ContentItem has a default value for redirects: ``` field :redirects, type: Array, default: [] ``` And the rest of the content-store expects this value to be an Array, not to be nil. By passing in potentially nil values in assign_attributes we allow a situation where fields that content-store expects not to be nil (because they have defaults), can be nil. This tends to result in NoMethodErrors, such as this one: ``` NoMethodError undefined method `map' for nil:NilClass redirects = item.redirects.map(&:to_h).map(&:deep_symbolize_keys) ^^^^ /app/app/models/route_set.rb:30:in `from_content_item' /app/app/models/content_item.rb:215:in `route_set' /app/app/models/content_item.rb:225:in `should_register_routes?' /app/app/models/content_item.rb:193:in `register_routes' /app/app/models/content_item.rb:33:in `create_or_replace' /app/app/controllers/content_items_controller.rb:32:in `block in update' ``` I can't think of any valid reason for overriding default attributes with nils, so it feels like calling .compact is the right thing to do here.
The `.any?` method, when called on a Relation, seems to instantiate the objects in the resultset, which is very slow on Kubernetes. It worked fine in Mongoid, but not in ActiveRecord. If we replace this with `.count.positive?`, it's much faster.
Whitehall doesn't do much validation of a given scheduled publishing time. As a result, it can sometimes send us really extreme values for `scheduled_publishing_delay_seconds` (e.g. 400 years into the future). This can cause problems in the importer when Mongo has accepted the value, but PostgreSQL can't. Changing the field type to `bigint` fixes the issue.
...it will fail due to the read-only filesystem in prod. Also, some whitespace-only corrections to the migration
It turns out that you can't call `LogStasher.add_custom_fields` twice. Because Content Store's custom fields config was being run after the govuk_app_config gem's own custom fields config, the gem's config was being overwritten. So, fields like `govuk_request_id` and `varnish_id` weren't appearing in Content Store's controller request logs (but `govuk_dependency_resolution_source_content_id` was). The gem (version 9.7.0) now provides a mechanism for setting custom fields that doesn't overwrite the gem's own settings https://docs.publishing.service.gov.uk/repos/govuk_app_config.html#logger-configuration: ``` GovukJsonLogging.configure do add_custom_fields do |fields| fields[:govuk_custom_field] = request.headers["GOVUK-Custom-Header"] end end ```
An earlier commit on main (#d5422b46 in PR #1136) fixed a subtle issue when overriding default values with nil, by explicitly setting `.created_at` and other attributes from the existing item when it was being replaced. This caused issues on this PostgreSQL branch after rebasing, as ActiveRecord behaves more as expected with respect to `created_at` and therefore the line creating a local `created_at` variable had been removed. This commit reintroduces that variable, and tests now pass again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks great to me! A heroic effort migrating to PostgreSQL.
I've run rubocop after each commit, to reduce the number of diffs in later commits and rebased against main.
I can't approve the PR as I opened it. So my suggestion would be to push the test
branch to a better named branch, then open another PR (titled "Port application to PostgreSQL", or similar), which I can approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a read through this commit-by-commit (thanks for cleaning them up), and it all looks good to me.
I had one extremely minor nitpick comment about a comment, but other than that I'm happy for this to merge. Approving so you can choosed to merge this PR if you like, but if you want to open another PR with a better branch name that's fine too - let me know and I'll approve that too.
app/models/content_item.rb
Outdated
@@ -59,12 +63,10 @@ def self.find_by_path(path) | |||
::FindByPath.new(self).find(path) | |||
end | |||
|
|||
# We want to force the JSON representation to use "base_path" instead of | |||
# "_id" to prevent "_id" being exposed outside of the model. | |||
# We want to force the JSON representation to use "base_path" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 'We want to force the JSON representation to use "base_path"' still accurate? We've removed the confusing hash["base_path"] = hash.delete("_id")
bit now.
Note for the record: we've decided the best way to resolve the situation (production being built from
This will allow us to maintain all the existing links & references to #1085 in Github comments / commit messages / documents on Google drive / Trello cards, etc as the primary source of truth and history. |
This branch was created from
port-to-postgresql-squashed
(PR #1196, a partially-cleaned version of #1085 ) yesterday, and squashed down to 39 commits by @brucebolt and @JonathanHallam. However, the commit messages still showed lots of artefacts from previous messy rebases - duplicated messages, or messages that no longer accurately described the changes.I (@aldavidson) went through all the commits this today, editing & rewording messages to create a cleaner history, and I also found a few more commits that could be squashed together. It's now ready for re-review.
This application is owned by the publishing platform team. Please let us know in #govuk-publishing-platform when you raise any PRs.
Follow these steps if you are doing a Rails upgrade.